
Conversation


@NicolasHug NicolasHug commented Oct 16, 2025

Currently in our BETA CUDA Interface, we error loudly if we find that NVDEC doesn't support the input video.

In this PR, we do not error loudly anymore: we fall back to the CPU. This aligns the behavior of the BETA Interface with that of the FFmpeg interface, since FFmpeg was doing the fallback internally already (see #943).

There are some key design choices which are better explained in context, so I'll leave some comments below.

@meta-cla bot added the CLA Signed label Oct 16, 2025
@NicolasHug NicolasHug marked this pull request as ready for review October 17, 2025 11:25
return decoder->frameReadyInDisplayOrder(dispInfo);
}

static UniqueCUvideodecoder createDecoder(CUVIDEOFORMAT* videoFormat) {
Contributor Author

All checks below have been moved to the new bool nativeNVDECSupport(const SharedAVCodecContext& codecContext);

return std::nullopt;
}
}

Contributor Author

These are the same checks as before, but with a crucial difference: the source of truth that we use to get the stream properties is now FFmpeg's AVCodecContext. Previously, it was NVCUVID's CUVIDEOFORMAT.
I think that's fine. I can't imagine NVCUVID being more accurate than FFmpeg.
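
For illustration, here is a minimal sketch of what such a check can look like, with every property read from the AVCodecContext and validated against NVDEC's reported capabilities. This is hypothetical, not the PR's actual implementation: mapAVCodecIDToNVDEC is an assumed helper, SharedAVCodecContext is the PR's alias (assumed to behave like a smart pointer), and the chroma format is hard-coded to 4:2:0 for brevity.

#include <optional>
#include <cuda.h>
#include <cuviddec.h>
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/pixdesc.h>
}

// Hypothetical helper: map FFmpeg codec IDs to NVDEC codec enums.
static std::optional<cudaVideoCodec> mapAVCodecIDToNVDEC(AVCodecID id) {
  switch (id) {
    case AV_CODEC_ID_H264: return cudaVideoCodec_H264;
    case AV_CODEC_ID_HEVC: return cudaVideoCodec_HEVC;
    case AV_CODEC_ID_AV1:  return cudaVideoCodec_AV1;
    default:               return std::nullopt;
  }
}

bool nativeNVDECSupport(const SharedAVCodecContext& codecContext) {
  auto codec = mapAVCodecIDToNVDEC(codecContext->codec_id);
  if (!codec.has_value()) {
    return false;
  }
  const AVPixFmtDescriptor* desc = av_pix_fmt_desc_get(codecContext->pix_fmt);
  if (desc == nullptr) {
    return false;
  }
  // Ask NVDEC whether this codec / bit-depth combination is supported on the
  // current device. Assumes a CUDA context is current, and 4:2:0 chroma.
  CUVIDDECODECAPS caps = {};
  caps.eCodecType = *codec;
  caps.eChromaFormat = cudaVideoChromaFormat_420;
  caps.nBitDepthMinus8 = desc->comp[0].depth - 8;
  if (cuvidGetDecoderCaps(&caps) != CUDA_SUCCESS || !caps.bIsSupported) {
    return false;
  }
  // Also validate the coded dimensions against the decoder's limits.
  return codecContext->coded_width >= caps.nMinWidth &&
      codecContext->coded_height >= caps.nMinHeight &&
      static_cast<unsigned>(codecContext->coded_width) <= caps.nMaxWidth &&
      static_cast<unsigned>(codecContext->coded_height) <= caps.nMaxHeight;
}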

}
return;
}

Contributor Author

Above is essentially the same as the FFmpeg interface logic:

if (avFrame->format != AV_PIX_FMT_CUDA) {
  // The frame's format is AV_PIX_FMT_CUDA if and only if its content is on
  // the GPU. In this branch, the frame is on the CPU. There are two possible
  // reasons:
  //
  // 1. During maybeConvertAVFrameToNV12OrRGB24(), we had a non-NV12 format
  //    frame and we're on FFmpeg 4.4 or earlier. In such cases, we had to
  //    use CPU filters and we just converted the frame to RGB24.
  // 2. This is what NVDEC gave us if it wasn't able to decode a frame, for
  //    whatever reason. Typically that happens if the video's encoder isn't
  //    supported by NVDEC.
  //
  // In both cases, we have a frame on the CPU. We send the frame back to the
  // CUDA device when we're done.
  enum AVPixelFormat frameFormat =
      static_cast<enum AVPixelFormat>(avFrame->format);
  FrameOutput cpuFrameOutput;
  if (frameFormat == AV_PIX_FMT_RGB24) {
    // Reason 1 above. The frame is already in RGB24, we just need to convert
    // it to a tensor.
    cpuFrameOutput.data = rgbAVFrameToTensor(avFrame);
  } else {
    // Reason 2 above. We need to do a full conversion which requires an
    // actual CPU device.
    cpuInterface_->convertAVFrameToFrameOutput(avFrame, cpuFrameOutput);
  }
  // Finally, we need to send the frame back to the GPU. Note that the
  // pre-allocated tensor is on the GPU, so we can't send that to the CPU
  // device interface. We copy it over here.
  if (preAllocatedOutputTensor.has_value()) {
    preAllocatedOutputTensor.value().copy_(cpuFrameOutput.data);
    frameOutput.data = preAllocatedOutputTensor.value();
  } else {
    frameOutput.data = cpuFrameOutput.data.to(device_);
  }
  return;
}

There are still minor differences that we should eventually align as part of #943.

# https://github.com/meta-pytorch/torchcodec/pull/977
# We need to define a better way to assert which backend a decoder
# is using.
return
Contributor Author

It's OK to deactivate this test for now; we know it's working fine.

I think we should have a private way to query the VideoDecoder object so that it can tell us which interface / backend it's currently using. I will follow up on this.
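
As a rough idea of what that could look like on the C++ side (all names here are hypothetical, not from this PR):

// Hypothetical: an enum describing which backend actually decodes frames.
enum class DecoderBackend { CpuFFmpeg, CudaFFmpeg, CudaBeta, CpuFallback };

class SingleStreamDecoder {
 public:
  // For tests only: report which backend is currently in use, so tests can
  // assert "NVDEC was used" or "the CPU fallback kicked in" directly.
  DecoderBackend getActiveBackend() const {
    return activeBackend_;
  }

 private:
  DecoderBackend activeBackend_ = DecoderBackend::CpuFFmpeg;
};

A private Python binding could then surface this to the tests.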

dec.get_frame_at(0)
beta = VideoDecoder(H265_VIDEO.path, device="cuda").get_frame_at(0)

torch.testing.assert_close(ffmpeg.data, beta.data, rtol=0, atol=0)
Contributor Author

Above: we used to assert that an error was raised; now we just assert that the video can be decoded properly. Strictly speaking, we're not asserting that there is a fallback. (That will probably come in follow-ups.)

There's another important thing that we're not actually enforcing in the tests: we need to make sure that we don't have false positives, i.e. that we're not falling back to the CPU when we don't need to. To check that, I added a TORCH_CHECK(false) in the CPU fallback path (sketched below). All the tests pass, except for these two:

FAILED test/test_decoders.py::TestVideoDecoder::test_get_key_frame_indices[cuda:beta] - RuntimeError: Falling back to CPU decoder.
FAILED test/test_decoders.py::TestVideoDecoder::test_beta_cuda_interface_cpu_fallback - RuntimeError: Falling back to CPU decoder.

This is the expected result: these two tests are using H265_VIDEO, which is the only video that needs a fallback.
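
The probe itself was just a temporary check at the top of the fallback path, along the lines of:

// Temporary probe, not committed: any test that still passes with this in
// place is guaranteed to have decoded natively on NVDEC.
TORCH_CHECK(false, "Falling back to CPU decoder.");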

I will try to follow up with an official and robust way to assert the backend / fallback of the VideoDecoder object (see the other comment below).

// We'll always use the CPU fallback from now on, so we can return early.
return;
}

Contributor Author
@NicolasHug NicolasHug Oct 17, 2025

Above: we create the fallback CPU interface only if we need it.

I'm a bit sad that the BETA Interface needs to store a CPU interface. I have TODOs in other places basically saying "An interface shouldn't rely on another interface".

But I'm not sure we have a better way to implement this fallback mechanism, so I think our principle should be something less strict, along these lines: "It's OK for interface A to use interface B, but interface B shouldn't need to worry about being called by interface A". I think we're respecting this principle so far.

The main design alternative would be for the fallback to be implemented within SingleStreamDecoder rather than in the interface itself, but I'm not sure we want this.
Specifically for this fallback, we'd have to create a third hybrid interface (the fallback one) that runs all the decoding on the CPU but runs the color conversion (convertAVFrameToFrameOutput) on the GPU. Sounds like a hassle.
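
The lazy-creation pattern is roughly the following (member and factory names are illustrative and may not match the PR exactly):

// Inside the BETA CUDA interface: the CPU interface member stays null until
// we learn that the stream isn't natively supported by NVDEC.
std::unique_ptr<DeviceInterface> cpuInterface_;

void initializeCpuFallback() {
  if (!cpuInterface_) {
    // Only pay the cost of a CPU interface when we actually need it.
    cpuInterface_ = createDeviceInterface(torch::Device(torch::kCPU));
  }
}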

Contributor

"It's OK for interface A to use interface B, but interface B shouldn't need to worry about being called by interface A". I think we're respecting this principle so far.

Agreed and agreed.

In my previous refactoring of the FFmpeg CUDA device interface, I believe I had always created the CPU device for fallback. But in that device, I also think we only knew we would need the fallback when we discovered that FFmpeg had decoded the frame on the CPU and we then needed to do CPU color conversion. If we could discover that earlier in the FFmpeg CUDA device interface, then doing this pattern would be good. But I doubt we can.

} else {
  frameOutput.data = cpuFrameOutput.data.to(device_);
}
return;
Contributor

We do this dance in at least one other place in the FFmpeg CUDA device. Maybe part of #943 could also be pulling this into a utils function.
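
Such a util could look roughly like this (hypothetical name and signature):

#include <optional>
#include <torch/torch.h>

// Hypothetical helper: move a CPU frame to the target device, reusing the
// pre-allocated GPU tensor when the caller provided one.
torch::Tensor toTargetDevice(
    const torch::Tensor& cpuData,
    const torch::Device& device,
    std::optional<torch::Tensor> preAllocatedOutputTensor) {
  if (preAllocatedOutputTensor.has_value()) {
    // copy_() fills the existing GPU tensor in place, avoiding a fresh
    // allocation on the device.
    preAllocatedOutputTensor.value().copy_(cpuData);
    return preAllocatedOutputTensor.value();
  }
  return cpuData.to(device);
}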

@NicolasHug NicolasHug merged commit 1befed7 into meta-pytorch:main Oct 20, 2025
58 checks passed
@NicolasHug NicolasHug deleted the fallback-better branch October 20, 2025 16:25
NicolasHug added a commit to NicolasHug/torchcodec that referenced this pull request Oct 27, 2025